Add hash for N integers #135745
Pinging @elastic/es-analytical-engine (Team:Analytics)
for (long index = slot;; index = nextSlot(index, mask)) {
    final long curId = id(index);
    if (curId == -1) { // means unset
        setId(index, id);
Shouldn't we be unsetting the first match that is not -1? I may have misunderstood this.
Good point. I copy-pasted this from other classes. I think the issue is the method name, reset. The method is called after unsetting (in removeAndAdd) and re-adds the id to an empty slot instead.
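To make the re-add step concrete, here is a minimal sketch of linear probing into the first unset slot, modeled on the loop quoted above. The class ProbeSketch, the method reAdd, and the plain long[] storage are hypothetical stand-ins for illustration, not the PR's actual classes; the sketch also assumes a power-of-two capacity and guards against a full table, which the quoted loop does not show.

```java
import java.util.Arrays;

/** Toy open-addressing table. All names here are illustrative, not the PR's code. */
final class ProbeSketch {
    private final long[] ids;  // -1 marks an unset slot
    private final long mask;   // capacity - 1; capacity must be a power of two
    private int used;          // number of occupied slots

    ProbeSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        ids = new long[capacity];
        Arrays.fill(ids, -1L);
        mask = capacity - 1;
    }

    private static long nextSlot(long index, long mask) {
        return (index + 1) & mask; // wrap around at the end of the table
    }

    long id(long index) {
        return ids[(int) index];
    }

    /**
     * Re-adds an id starting at its home slot and returns the slot used.
     * It never unsets anything: it only looks for the first empty slot.
     */
    long reAdd(long homeSlot, long id) {
        if (used == ids.length) {
            throw new IllegalStateException("table is full"); // avoids an endless probe
        }
        for (long index = homeSlot & mask;; index = nextSlot(index, mask)) {
            if (id(index) == -1L) { // means unset
                ids[(int) index] = id;
                used++;
                return index;
            }
        }
    }
}
```

With a capacity of 4, three ids whose home slot is 3 land in slots 3, 0 and 1 in turn, which shows the wrap-around of the probe sequence.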
Nice idea Nhat! How much do you think this will improve queries with | STATS ... BY field1.. on single valued fields?
It depends on the number of grouping fields and rows. However, we need another BlockHash before we can see any improvement.
Thanks @kkrik-es @martijnvg!
PackedValuesBlockHash can be very slow with a large number of groupings. I plan to introduce a BlockHash that generates a hash for each column, then uses IntNHash to generate the final hash ID. If a multi-value block is detected, we can fall back to PackedValuesBlockHash.
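As a rough illustration of the two-level idea described above, the sketch below assigns a dense id per column value and then a dense id per tuple of N column ids. It is only a sketch under stated assumptions: TupleIdSketch and add are hypothetical names, java.util.HashMap stands in for both the per-column hashes and IntNHash, values are plain strings, and the multi-value fallback to PackedValuesBlockHash is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-level grouping: per-column ids, then one id per tuple of ids. */
final class TupleIdSketch {
    // One dictionary per grouping column: value -> dense column id.
    private final List<Map<String, Integer>> columnIds = new ArrayList<>();
    // Stand-in for a hash over N integers: tuple of column ids -> dense group id.
    private final Map<List<Integer>, Integer> tupleIds = new HashMap<>();

    TupleIdSketch(int columns) {
        for (int i = 0; i < columns; i++) {
            columnIds.add(new HashMap<>());
        }
    }

    /** Returns the group id for a single-valued row, assigning new ids on first sight. */
    int add(String... row) {
        if (row.length != columnIds.size()) {
            throw new IllegalArgumentException("expected " + columnIds.size() + " columns");
        }
        Integer[] key = new Integer[row.length];
        for (int c = 0; c < row.length; c++) {
            Map<String, Integer> ids = columnIds.get(c);
            Integer id = ids.get(row[c]);
            if (id == null) {
                id = ids.size(); // next dense id for this column
                ids.put(row[c], id);
            }
            key[c] = id;
        }
        List<Integer> tuple = Arrays.asList(key); // key is not mutated afterwards
        Integer groupId = tupleIds.get(tuple);
        if (groupId == null) {
            groupId = tupleIds.size(); // next dense group id
            tupleIds.put(tuple, groupId);
        }
        return groupId;
    }
}
```

The point of the split is that the final lookup only compares N small integers per row, regardless of how wide the original column values are.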